TasNet: time-domain audio separation network for real-time, single-channel speech separation
Authors
Abstract
Robust speech processing in multi-talker environments requires effective speech separation. Recent deep learning systems have made significant progress toward solving this problem, yet it remains challenging, particularly in real-time, short-latency applications. Most methods attempt to construct a mask for each source in a time-frequency representation of the mixture signal, which is not necessarily an optimal representation for speech separation. In addition, time-frequency decomposition has inherent problems, such as phase/magnitude decoupling and the long time window required to achieve sufficient frequency resolution. We propose the Time-domain Audio Separation Network (TasNet) to overcome these limitations. We directly model the signal in the time domain using an encoder-decoder framework and perform the source separation on nonnegative encoder outputs. This method removes the frequency decomposition step and reduces the separation problem to the estimation of source masks on encoder outputs, which are then synthesized by the decoder. Our system outperforms the current state-of-the-art causal speech separation algorithms, reduces the computational cost of speech separation, and significantly reduces the minimum required latency of the output. This makes TasNet suitable for applications where low-power, real-time implementation is desirable, such as hearable and telecommunication devices.
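The pipeline described in the abstract (encode a waveform segment into nonnegative weights, estimate one mask per source on those weights, decode each masked output) can be illustrated with a toy sketch. This is not the authors' implementation. The abstract does not specify the layers, so everything concrete here is an assumption: the weights are random and untrained, a ReLU enforces nonnegativity, a single linear map per source stands in for the mask estimator, masks are normalized with a softmax across sources, and the sizes and names (`separate_segment`, `SEGMENT_LEN`, `NUM_BASIS`, `NUM_SOURCES`) are hypothetical.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    """Random, untrained weights: a stand-in for learned parameters."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def softmax(values):
    """Numerically stable softmax over a short list."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def separate_segment(segment, encoder, mask_nets, decoder):
    # Encoder: project the raw waveform segment, then clamp to nonnegative
    # values (ReLU is an assumption; the abstract only says "nonnegative").
    weights = [max(0.0, w) for w in matvec(encoder, segment)]

    # Mask estimation (assumed form): one score vector per source, then a
    # softmax across sources so the masks sum to 1 for every basis index.
    scores = [matvec(net, weights) for net in mask_nets]
    masks = [softmax([s[k] for s in scores]) for k in range(len(weights))]

    # Decoder: apply each source's mask to the encoder output and
    # synthesize a waveform segment with a linear map.
    outputs = []
    for src in range(len(mask_nets)):
        masked = [masks[k][src] * weights[k] for k in range(len(weights))]
        outputs.append(matvec(decoder, masked))
    return weights, outputs


# Hypothetical sizes chosen only for the demonstration.
SEGMENT_LEN, NUM_BASIS, NUM_SOURCES = 8, 16, 2
rng = random.Random(0)
encoder = random_matrix(NUM_BASIS, SEGMENT_LEN, rng)
mask_nets = [random_matrix(NUM_BASIS, NUM_BASIS, rng) for _ in range(NUM_SOURCES)]
decoder = random_matrix(SEGMENT_LEN, NUM_BASIS, rng)

# A short synthetic "mixture" segment made of two sinusoids.
segment = [math.sin(0.7 * t) + 0.5 * math.sin(2.1 * t) for t in range(SEGMENT_LEN)]
weights, sources = separate_segment(segment, encoder, mask_nets, decoder)

# Decoding the unmasked encoder output, for comparison with the source sum.
unmasked = matvec(decoder, weights)
```

Because the assumed decoder is linear and the assumed masks sum to one, the decoded source segments add up to the decoded unmasked encoder output. With untrained weights the outputs are not meaningful separations; the sketch only shows how the data flows with no frequency decomposition step.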
Similar resources
Single Channel Audio Source Separation
Blind source separation is an advanced statistical tool that has found widespread use in many signal processing applications. However, the crux topic based on one channel audio source separation has not fully developed to enable its way to laboratory implementation. The main idea approach to single channel blind source separation is based on exploiting the inherent time structure of sources kn...
Multi-channel Algorithms for Improving Speech Recognition Accuracy in Adverse Environments
The focus of this thesis is on speech processing techniques for increasing robustness against additive noise and reverberation. In particular, single-channel and multi-channel algorithms will be presented. Among single-channel techniques, improvements of minimum mean squared error based approaches have been proposed. Techniques for gain function smoothing, and soft-decision originally proposed ...
Intelligent Single-Channel Methods for Multi-Source Audio Analysis
This thesis investigates the potential of recent machine learning methods for the challenging task of single-channel, multi-source audio analysis, i.e., information extraction from single-channel audio where the sources of interest (e.g., speech) are mixed with multiple interfering sources. First, it is shown that source separation by recently proposed techniques for non-negative matrix f...
Bitwise Neural Networks for Efficient Single-Channel Source Separation
We present Bitwise Neural Networks (BNN) as an efficient hardware-friendly solution to single-channel source separation tasks in resource-constrained environments. In the proposed BNN system, we replace all the real-valued operations during the feedforward process of a Deep Neural Network (DNN) with bitwise arithmetic (e.g. the XNOR operation between bipolar binaries in place of multiplications...
Block Nonnegative Matrix Factorization for Single Channel Source Separation
Nonnegative Matrix Factorization (NMF) [1, 2] has been widely used in audio research, e.g. automatic music transcription [3], musical source separation [4], and speech enhancement [5]. The key strategy for applying NMF to audio-related tasks is to find a lower rank representation of the Short Time Fourier Transformed (STFT) input signal and use the basis vectors as dictionaries. For example, in...
Journal title: CoRR
Volume: abs/1711.00541
Issue: -
Pages: -
Publication date: 2017